
Conversation

Contributor
@mitruska mitruska commented Sep 30, 2025

Details:

  • Specification of the MOE internal operation
  • Internal ops are used mainly for fusion transformations and optimizations;
    they will not appear in the public IR of a converted model

Describes MOE used in PR:

Tickets:

  • 171911

@mitruska mitruska requested a review from a team as a code owner September 30, 2025 09:33
@mitruska mitruska requested review from zKulesza and removed request for a team September 30, 2025 09:33
@github-actions github-actions bot added the category: docs OpenVINO documentation label Sep 30, 2025
@mitruska mitruska self-assigned this Sep 30, 2025
Comment on lines +63 to +68
# Experts computation part (GEMM3_SWIGLU)
x_proj = matmul(reshaped_hidden_states, weight_0, transpose_a=False, transpose_b=True)
x_proj2 = matmul(reshaped_hidden_states, weight_1, transpose_a=False, transpose_b=True)
swiglu = swish(x_proj, beta=expert_beta)
x_proj = x_proj2 * swiglu
down_proj = matmul(x_proj, weight_2, transpose_a=False, transpose_b=True)
Contributor Author

The GPU plugin's request is to transpose these weights at the conversion stage, so at this point both MatMul transpose_a/transpose_b attributes should be False:

Suggested change

# Before:
# Experts computation part (GEMM3_SWIGLU)
x_proj = matmul(reshaped_hidden_states, weight_0, transpose_a=False, transpose_b=True)
x_proj2 = matmul(reshaped_hidden_states, weight_1, transpose_a=False, transpose_b=True)
swiglu = swish(x_proj, beta=expert_beta)
x_proj = x_proj2 * swiglu
down_proj = matmul(x_proj, weight_2, transpose_a=False, transpose_b=True)

# After:
# Experts computation part (GEMM3_SWIGLU)
x_proj = matmul(reshaped_hidden_states, weight_0, transpose_a=False, transpose_b=False)
x_proj2 = matmul(reshaped_hidden_states, weight_1, transpose_a=False, transpose_b=False)
swiglu = swish(x_proj, beta=expert_beta)
x_proj = x_proj2 * swiglu
down_proj = matmul(x_proj, weight_2, transpose_a=False, transpose_b=False)

cc: @yeonbok
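
To illustrate why the suggestion is safe, here is a minimal NumPy sketch (not OpenVINO API; shapes are illustrative assumptions) of the equivalence it relies on: pre-transposing the weight constant at conversion time lets MatMul run with transpose_b=False and produce the same result.

```python
# Minimal NumPy sketch (not OpenVINO API); shapes are illustrative assumptions.
import numpy as np

hidden = np.random.rand(4, 8).astype(np.float32)     # [num_tokens, hidden_size]
weight_0 = np.random.rand(16, 8).astype(np.float32)  # [inter_size, hidden_size]

# Current pattern: transpose_b=True, i.e. hidden @ weight_0.T
out_transpose_b_true = hidden @ weight_0.T

# After transposing the weight constant at conversion stage: transpose_b=False
weight_0_pre_transposed = np.ascontiguousarray(weight_0.T)  # [hidden_size, inter_size]
out_transpose_b_false = hidden @ weight_0_pre_transposed

assert np.allclose(out_transpose_b_true, out_transpose_b_false)
```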

@@ -0,0 +1,151 @@
.. {#openvino_docs_ops_internal_MOE}

MOE
Collaborator

Let's not use the MoE name here, because we may want to use it for an external operation and for a real MoE operation. Right now this op is a sort of FusedExperts.

Contributor Author

The routing weights and indices are provided as inputs, so the core MOE idea is preserved; the final multiplication and ReduceSum are included as well.
I would keep the name as is, to make the current purpose clear.
The MOE internal op can be refactored as needed in the future, and possibly extended with a Router.
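
For illustration, a hedged NumPy sketch of the final multiplication and ReduceSum mentioned above (shapes and names are assumptions, not the spec): each expert's output is scaled by its routing weight, and the results are summed over the expert axis.

```python
# Hypothetical NumPy sketch; shapes are assumptions for illustration only.
import numpy as np

num_experts, num_tokens, hidden_size = 4, 6, 8
expert_outputs = np.random.rand(num_experts, num_tokens, hidden_size)
routing_weights = np.random.rand(num_experts, num_tokens, 1)  # per-token weight per expert

weighted = expert_outputs * routing_weights   # final Multiply
moe_output = weighted.sum(axis=0)             # ReduceSum over experts -> [tokens, hidden]
```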

.. code-block:: py
   :force:

   # Common part: Reshape hidden states and prepare for expert computation
Collaborator

I propose to add router_topk_output_indices into this logic; it would show how the weights are prepared. Right now it is not clear how router_topk_output_indices is used in the specified operation.
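
For example, a hypothetical NumPy sketch of how router_topk_output_indices could be combined with the top-k weights to prepare dense per-expert routing weights (names and shapes are assumptions, not the operation specification):

```python
# Hypothetical sketch; names and shapes are assumptions, not the spec.
import numpy as np

num_tokens, num_experts, topk = 6, 4, 2
router_topk_output_weights = np.random.rand(num_tokens, topk)
router_topk_output_indices = np.random.randint(0, num_experts, size=(num_tokens, topk))

# Scatter the top-k weights into a dense [tokens, experts] map; unselected experts get 0.
dense_routing_weights = np.zeros((num_tokens, num_experts))
np.put_along_axis(dense_routing_weights, router_topk_output_indices,
                  router_topk_output_weights, axis=1)
```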

Collaborator

@rkazants rkazants left a comment

Good job! Thank you, Kasia. Left a couple of comments.

github-merge-queue bot pushed a commit that referenced this pull request Oct 16, 2025
… experts into MOE (#32183)

### Details:
This transformation runs at compile time and is not enabled by default;
it should be enabled in each plugin that supports the MOE op.
Example registration of the fusion transformation for the CPU plugin:
41145cf

- Fuse vectorized MatMul experts into MOE for the 3-GEMM and 2-GEMM patterns:
```
class ov::pass::VectorizedExpertsFusion : public ov::pass::GraphRewrite {
public:
    OPENVINO_GRAPH_REWRITE_RTTI("VectorizedExpertsFusion");
    VectorizedExpertsFusion() {
        add_matcher<ov::pass::FuseVectorizedMOE2GEMM>();
        add_matcher<ov::pass::FuseVectorizedMOE3GEMM>();
    }
};
```
 - Add internal MOE op
 


MOE internal op spec PR:
- #32255

## Preliminary requirements (offline transformations):
- The patterns match MatMul with (transpose_a=False, transpose_b=**True**); for
batched MatMuls, a preliminary update of MatMulConstTransposesExtraction is
needed:
   - #32378

- Fusion of separate MatMul experts into a vectorized (batched) MatMul
(see the shape-level sketch after this list):
   - #32199
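
A shape-level NumPy sketch of what "vectorizing" separate per-expert MatMuls into one batched MatMul means (names and shapes are illustrative assumptions, not the transformation's actual matching logic):

```python
# Illustrative NumPy sketch; not the transformation code itself.
import numpy as np

num_experts, num_tokens, hidden_size, inter_size = 4, 6, 8, 16
hidden = np.random.rand(num_tokens, hidden_size).astype(np.float32)

# Separate experts: one [hidden_size, inter_size] weight per expert (already pre-transposed).
expert_weights = [np.random.rand(hidden_size, inter_size).astype(np.float32)
                  for _ in range(num_experts)]
separate_outputs = np.stack([hidden @ w for w in expert_weights])  # [experts, tokens, inter]

# Vectorized form: stack the weights into [experts, hidden, inter] and run one batched MatMul.
batched_weights = np.stack(expert_weights)
batched_outputs = hidden[np.newaxis] @ batched_weights              # [experts, tokens, inter]

assert np.allclose(separate_outputs, batched_outputs, atol=1e-5)
```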
 
### Tickets:
 - transformation (and fusion details): 173663, op: 171913

Labels

category: docs OpenVINO documentation
